{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div align=\"center\">\n",
    " <font size=\"5\" > **基于机器学习的工程机械设备故障预测系统Part3 - 测试集Leak**</font>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "在前两章，我们给出了针对该竞赛常见的分析于可视化的方法以及根据我们的EDA如何进行特征工程,设计验证集,并通过结合规则&模型得到一个非常好的成绩。\n",
    "\n",
    "在本篇文章中，我们介绍一种探索测试集label的方法,虽然很少会碰到,但是也是作为数据竞赛者一种必备的直觉,也有必要了解一下。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Leak-观测\" data-toc-modified-id=\"Leak-观测-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Leak 观测</a></span><ul class=\"toc-item\"><li><span><a href=\"#测试集观察-&amp;-预测(线上1.0)\" data-toc-modified-id=\"测试集观察-&amp;-预测(线上1.0)-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>测试集观察 &amp; 预测(线上1.0)</a></span></li><li><span><a href=\"#根据群友给的每个单独的结果计算的值计算出所有的label的个数\" data-toc-modified-id=\"根据群友给的每个单独的结果计算的值计算出所有的label的个数-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>根据群友给的每个单独的结果计算的值计算出所有的label的个数</a></span><ul class=\"toc-item\"><li><span><a href=\"#模型提交\" data-toc-modified-id=\"模型提交-1.2.1\"><span class=\"toc-item-num\">1.2.1&nbsp;&nbsp;</span>模型提交</a></span></li></ul></li></ul></li><li><span><a href=\"#小结\" data-toc-modified-id=\"小结-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>小结</a></span></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Leak 观测\n",
    "## 测试集观察 & 预测(线上1.0)\n",
    "\n",
    "我们发现我们的预测结果存在非常大的规律，有极大的聚类情况，1182分布在最下面,1141除了5%不到的分布在了10000附近，95%都分布在后半段;1239在1141前面一点，1174因为没有出现在我们的预测中，我们随机将其放到最后。\n",
    "\n",
    "![](./pic/suspect.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZUAAAELCAYAAAARNxsIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XucVXW9//HXWxHvCOqoBCiok+IlESbE7PhTMURT0XOyNC9kFj8NS08dS7t5q192ulie1KIkwVOZWSY/j0qE2sVEGe8gGBOZTJpMgUqiIvA5f6zvOFv2HmaAtfeaDe/n47Eee63P+q41n7Vg+LDW97vXUkRgZmaWh82KTsDMzDYeLipmZpYbFxUzM8uNi4qZmeXGRcXMzHLjomJmZrlxUTEzs9y4qJiZWW5cVMzMLDe9ik6g1nbeeecYPHhw0WmYmdWNhx9++O8R0dCdtptcURk8eDDNzc1Fp2FmVjck/aW7bX37y8zMcuOiYmZmuXFRMTOz3LiomJlZblxUzMwsNy4qZmaWGxcVMzPLjYuKmZnlZpP78mORVq1axV6fuxuAWRcdyW47bVNwRgW7cjcg4AsvFJ2J9RDz9h36luWh8+cVlImtL1+p1MicRW1vFhSAUV+7l9OvvbfAjAr0x3vgsh1g1auw6rVsvvnmorOygq1ZUDqLbUquPfcevnvBPUWnsU5cVGrk+GsfKovdv2h5AZn0AD8+uTx2x/+tfR5mPdR/f+Eerj03KyarXufN+XpQtaIiaR9Jj5VML0u6UNKOkmZIWpA++6X2knSNpBZJT0gaXrKv8an9AknjS+IjJD2ZtrlGkqp1PGZmtfJSW3msXgpL1YpKRDwdEcMiYhgwAlgO3AZcDMyMiEZgZloGOBZoTNME4HoASTsClwKHACOBS9sLUWozoWS7sdU6HjMz61qtbn+NBv4UEX8BxgFTUnwKcFKaHwdMjcwsoK+k/sAxwIyIWBIRS4EZwNi0rk9EPBARAUwt2Zf1ZBN+Wx4b/z+1z8N6loEDy0I7nOvbovWmVkXlVOAnaX7XiHgeIH3ukuIDgEUl27Sm2NrirRXiPdIzV723W7FNwtsOgn+fCzs1wo57wQWPw5B3F52VFWzor2fAnnu+udz7yCN524UXFpiRrY+qDymW1Bs4Ebikq6YVYrEe8Uo5TCC7Tcbuu+/eRRrVs8kWkUp2GAgf93tt7K2G3ukr1npXiyuVY4FHIqL9ywgvpFtXpM/FKd4KDCrZbiDwXBfxgRXiZSJiUkQ0RURTQ0O3Xl5mZmbroRZF5TQ6bn0BTAPaR3CNB24viZ+VRoGNAl5Kt8emA2Mk9Usd9GOA6WndMkmj0qivs0r2ZWZmBajq7S9J2wDvAUp7264CbpF0DvAscEqK3wkcB7SQjRQ7GyAilki6Epid2l0REUvS/HnAjcDWwF1pMjOzglS1qETEcmCnNWL/IBsNtmbbACZ2sp/JwOQK8WbggFySNTOzDeZnf5lZj/GWx7L07s3QJx4vLhlbL35Mi5n1CGXP+VqxYpN/9lc9clGxYix9FqaMgxuPh38sLDobM8uJb39Z7bUtgGubOpb/62A49/ew24HF5WRmufCVitXe7eeVx375sdrnYVZHdn5H0Rl0j4uK1d7K18tjqyrEzDZRE7971FuW337EdnzgY0d10rpn8e0vq71jvwY/POatsbFXFZOLWQ+1ZmGpF75SsdrbYxRsuUPHcu/tYK/6/AUys7dyUbHam3oyvP5Sx/KKf8L3jy4uHzPLjYuK1d7CCm+w++vs8piZ1R0XFSuA3/pstrFyUbHaO/SC8tiID9c+DzPLnYuK1d4xl8OQIzuWd383nHB1cfmYWW48pNiKMf6X8MJTEKthNz9o2mxj4aJitbdqJXytEV5Lr8XZsg9c9Cfo1bvYvMxsg/n2l9Xe94/qKCgAr78M140qLh8zy42LitXe3yq8I2PJn2qfh5nlzkXFzMxyU9WiIqmvpFslzZc0T9KhknaUNEPSgvTZL7WVpGsktUh6QtLwkv2MT+0XSBpfEh8h6cm0zTWS/AWIerD/B8pje42pfR5mlrtqX6l8G7g7IvYFDgLmARcDMyOiEZiZlgGOBRrTNAG4HkDSjsClwCHASODS9kKU2kwo2W5slY/H8nDKJNj/lI7lxuPhzJ8Vl4+Z5aZqo78k9QEOBz4EEBErgBWSxgFHpGZTgPuAzwDjgKkREcCsdJXTP7WdERFL0n5nAGMl3Qf0iYgHUnwqcBJwV7WOyXJ0yg+yycw2KtW8UtkTaAN+KOlRST+QtC2wa0Q8D5A+d0ntBwCLSrZvTbG1xVsrxM3MrCDVLCq9gOHA9RFxMPAKHbe6KqnUHxLrES/fsTRBUrOk5ra2trVnbWZm662aRaUVaI2IB9PyrWRF5oV0W4v0ubik/aCS7QcCz3URH1ghXiYiJkVEU0Q0NTQ0bNBBmZlZ56pWVCLib8AiSfuk0GjgKWAa0D6Cazxwe5qfBpyVRoGNAl5Kt8emA2Mk9Usd9GOA6WndMkmj0qivs0r2ZfXgjdfgjVeLzsLMclTtx7R8HPiRpN7AQuBsskJ2i6RzgGeB9mFAdwLHAS3A8tSWiFgi6Uqg/YUbV7R32gPnATcCW5N10LuTvl7cfQnMuj6bb/owvPcb4BHhZnWvqkUlIh4DmiqsGl2hbQATO9nPZGByhXgz4KcR1puHb4JZ13UsN98AO+4F76r4x29mdcTfqLfa+//nl8d+9dna52FmuXNRMTOz3LiomJlZblxUzMwsNy4qVnt7HFUee9vI2udhZrlzUbHae+9XymMnfKP2eZhZ7vw6Yau9XfaF02+FP1wDEXDoROj/jqKzMrMcuKhYMRrfk01mtlHx7S8zM8uNi4qZmeXGRcXMzHLjomJmZrlxUTEzs9x49JcV49vDYOmfs/k+g+CTc4rNx8xy4SsVq72ff7SjoAC8vAh+fGpx+ZhZblxUrPbm31EeW3hP7fMws9y5qFjtbblDeWyL7Wufh5nlzkXFaq/3duWxXlvXPg8zy11Vi4qkZyQ9KekxSc0ptqOkGZIWpM9+KS5J10hqkfSEpOEl+xmf2i+QNL4kPiLtvyVt65ec14Md9+hezMzqTi2uVI6MiGER0f6u+ouBmRHRCMxMywDHAo1pmgBcD1kRAi4FDgFGApe2F6LUZkLJdmOrfzi2wUacUx4bdkbt8zCz3BVx+2scMCXNTwFOKolPjcwsoK+k/sAxwIyIWBIRS4EZwNi0rk9EPBARAUwt2Zf1ZJXeUX/3Z2qfh5nlrtpFJYBfSXpY0oQU2zUingdIn7uk+ABgUcm2rSm2tnhrhbj1dKtWlsdWV4iZWd2pdlE5LCKGk93amijp8LW0rdQfEusRL9+xNEFSs6Tmtra2rnK2ajvu6+Wxo6+ofR5mlruqFpWIeC59LgZuI+sTeSHduiJ9Lk7NW4FBJZsPBJ7rIj6wQrxSHpMioikimhoaGjb0sGxDHfR+GPtV6LUVbL4lHH0lHPKRorMysxxUrahI2lbS9u3zwBhgDjANaB/BNR64Pc1PA85Ko8BGAS+l22PTgTGS+qUO+jHA9LRumaRRadTXWSX7sp5u1Lnw+RfgC4vh3Z8oOhszy0k1n/21K3BbGuXbC/hxRNwtaTZwi6RzgGeBU1L7O4HjgBZgOXA2QEQskXQlMDu1uyIilqT584Abga2Bu9JkZmYFqVpRiYiFwEEV4v8ARleIBzCxk31NBiZXiDcDB2xwsmZmlgt/o97MzHLjomJmZrlxUTEzs9y4qJiZWW5cVMzMLDcuKmZmlhsXFTMzy42LipmZ5cZFxczMcuOiYmZmuXFRMTOz3LiomJlZblxUzMwsNy4qZmaWGxcVMzPLjYuKmZnlxkXFzMxy46JiZma5qXpRkbS5pEcl3ZGWh0h6UNICST+V1DvFt0zLLWn94JJ9XJLiT0s6piQ+NsVaJF1c7WMxM7O1q8WVygXAvJLlrwJXR0QjsBQ4J8XPAZZGxN7A1akdkvYDTgX2B8YC16VCtTlwLXAssB9wWmprZmYFqWpRkTQQeC/wg7Qs4Cjg1tRkCnBSmh+XlknrR6f244CbI+L1iPgz0AKMTFNLRCyMiBXAzamtmZkVpNpXKt8CPg2sTss7AS9GxMq03AoMSPMDgEUAaf1Lqf2b8TW26SxuZmYFqVpRkXQ8sDgiHi4NV2gaXaxb13ilXCZIapbU3NbWtpaszcxsQ1TzSuUw4ERJz5DdmjqK7Mqlr6Reqc1A4Lk03woMAkjrdwCWlMbX2KazeJmImBQRTRHR1NDQsOFHZmZmFVWtqETEJRExMCIGk3W03xMRpwP3Au9LzcYDt6f5aWmZtP6eiIgUPzWNDhsCNAIPAbOBxjSarHf6GdOqdTxmZta1Xl03yd1ngJslfQl4FLghxW8AbpLUQnaFcipARMyVdAvwFLASmBgRqwAknQ9MBzYHJkfE3JoeiZmZvYWyi4FNR1NTUzQ3NxedhpmtYd6+QyvGh86fVzFutSPp4Yho6k5bf6PezMxy063bX5KWUT6y6iWgGfhURCzMOzEzM6s/3e1T+SbZyKofkw3lPRXYDXgamAwcUY3kzMysvnT39tfYiPheRCyLiJcjYhJwXET8FOhXxfzMzKyOdLeorJb0fkmbpen9Jes2rZ5+MzPrVHeLyunAmcBi4IU0f4akrYHzq5SbmZnVmW71qaSO+BM6Wf37/NIxM7N61t3RXw3AR4HBpdtExIerk5aZmdWj7o7+uh34HfBrYFX10jEzs3rW3aKyTUR8pqqZmJlZ3etuR/0dko6raiZmZlb3urxSSW9f/Gyafx14g+wLkBERfaqbnm20/rEQZl4OsRqO+iI0NBadkZnloMuiEhEh6bGIGF6LhGwTsPQZ+E4TROqem38HfOxBaHh7oWmZ2Ybr7u2vByS9s6qZ2KZj5pUdBQWyq5WZlxeXj5nlprsd9UcC56a3OL5Cx+2vd1QrMTMzqz/dLSrHVjUL27SM/gLMva3jakWbw+jLCk3JzPLR3W/U/6XaidgmpN9g+MQj8OvUUT/6i7DTXkVnZWY5KOJ1wmZZYTnlh0VnYWY5q9qbHyVtJekhSY9Lmivp8hQfIulBSQsk/VRS7xTfMi23pPWDS/Z1SYo/LemYkvjYFGuRdHG1jsXMzLqnmq8Tfh04KiIOAoYBYyWNAr4KXB0RjcBS4JzU/hxgaUTsDVyd2iFpP7KXgu0PjAWuk7S5pM2Ba8n6e/YDTkttzcysIFUrKpH5Z1rcIk0BHAXcmuJTgJPS/Li0TFo/On3xchxwc0S8HhF/BlqAkWlqiYiFEbECuDm1NTOzglTzSoV0RfEY2XtYZgB/Al6MiJWpSSswIM0PABYBpPUvATuVxtfYprO4mZkVpKpFJSJWRcQwYCDZlcXQSs3SpzpZt67xMpImSGqW1NzW1tZ14mZmtl6qWlTaRcSLwH3AKKCvpPZRZwOB59J8KzAIIK3fAVhSGl9jm87ilX7+pIhoioimhoaGPA7JzMwqqOborwZJfdP81sDRwDzgXuB9qdl4sne1AExLy6T190REpPipaXTYEKAReAiYDTSm0WS9yTrzp1XreMzMrGvV/J5Kf2BKGqW1GXBLRNwh6SngZklfAh4FbkjtbwBuktRCdoVyKkBEzJV0C/AUsBKYGJF9FVvS+cB0YHNgckTMreLxmJlZF5RdDGw6mpqaorm5ueg0zGwN8/at1OUKQ+fPq3EmtiZJD0dEU3fa1qRPxczMNg0uKmZmlhsXFTMzy42LipmZ5cZFxczMcuOiYmZmuXFRMTOz3LiomJlZblxUzMwsNy4qZmaWGxcVMzPLjYuKmZnlxkXFzMxy46JiZma5cVExM7PcuKiYmVluXFTMzCw3LipmZpabqhUVSYMk3StpnqS5ki5I8R0lzZC0IH32S3FJukZSi6QnJA0v2df41H6BpPEl8RGSnkzbXCNJ1ToeMzPrWjWvVFYCn4qIocAoYKKk/YCLgZkR0QjMTMsAxwKNaZoAXA9ZEQIuBQ4BRgKXthei1GZCyXZjq3g8ZmbWhaoVlYh4PiIeSfPLgHnAAGAcMCU1mwKclObHAVMjMwvoK6k/cAwwIyKWRMRSYAYwNq3rExEPREQAU0v2ZWZmBahJn4qkwcDBwIPArhHxPGSFB9glNRsALCrZrDXF1hZvrRA3M7OCVL2oSNoO+DlwYUS8vLamFWKxHvFKOUyQ1Cypua2trauUzcxsPVW1qEjagqyg/CgifpHCL6RbV6TPxSneCgwq2Xwg8FwX8YEV4mUiYlJENEVEU0NDw4YdlJmZdaqao78E3ADMi4hvlqyaBrSP4BoP3F4SPyuNAhsFvJRuj00HxkjqlzroxwDT07plkkaln3VWyb7MzKwAvaq478OAM4EnJT2WYp8FrgJukXQO8CxwSlp3J3Ac0AIsB84GiIglkq4EZqd2V0TEkjR/HnAjsDVwV5rMzKwgVSsqEfF7Kvd7AIyu0D6AiZ3sazIwuUK8GThgA9I0M7Mc+Rv1ZmaWGxcVMzPLTTX7VMw6t+wFuP/bEKvgsAuhT/+iMzKzHLioWO0tewG+dSCsej1bbp4MH38E+g5a+3Zm1uP59pfV3j1XdBQUgFUr4NeXF5ePmeXGRcVq77Vl5bEVa3vYgpnVCxcVq73DP0XZaPN/uaiQVMwsXy4qVnv9D4Izfwn9D4bdDoIP3gqD3ll0VmaWA3fUWzH2OgL2uq/oLMwsZ75SMTOz3LiomJlZblxUzMwsNy4qZmaWGxcVMzPLjYuKmZnlxkXFzMxy46JiZma5cVExM7PcVK2oSJosabGkOSWxHSXNkLQgffZLcUm6RlKLpCckDS/ZZnxqv0DS+JL4CElPpm2ukdTZq4t7jNseaWW/L97Fvp+/i+/9pqXodMzMclfNK5UbgbFrxC4GZkZEIzAzLQMcCzSmaQJwPWRFCLgUOAQYCVzaXohSmwkl2635s3qUBxb8nX+/5XGWr1jNaytX85W7nua6e11YzGzjUrWiEhG/BZasER4HTEnzU4CTSuJTIzML6CupP3AMMCMilkTEUmAGMDat6xMRD0REAFNL9tUjffynj5bFvj1zQQGZmJlVT637VHaNiOcB0ucuKT4AWFTSrjXF1hZvrRDvsXptVn53rkLIzKyu9ZSO+kr/vMZ6xCvvXJogqVlSc1tb23qmuGF+cFZTWeyyE/cvIJMe4o3X4adnws2nwxuvFp2NmeWk1kXlhXTrivS5OMVbgdIXlA8EnusiPrBCvKKImBQRTRHR1NDQsMEHsT4OGNiXaRPfRcP2vem3zRZ8/8wRfOCduxeSS+FeXARf3gXmTYP5d8CXd4M23wo02xjUuqhMA9pHcI0Hbi+Jn5VGgY0CXkq3x6YDYyT1Sx30Y4Dpad0ySaPSqK+zSvbVY/XZegv22HFbBvXbhl37bFV0OsX50b+Vx378/trnYWa5q9pLuiT9BDgC2FlSK9korquAWySdAzwLnJKa3wkcB7QAy4GzASJiiaQrgdmp3RUR0d75fx7ZCLOtgbvS1GO1Ln2FI77+mzeXT7z2fn70kZEctncxV06FWr60PPbai7XPw8xyV7WiEhGndbJqdIW2AUzsZD+TgckV4s3AARuSYy19+tYnymKfv20O9150ZAHZFOzA98Os77w1ts/xxeRiZrnqKR31G72Vq8vHEaxaXUAiPcGSheWxZa3lMTOrOy4qNfLlkw8si33xhKEFZNID9BtUHtuhQszM6o6LSo007rI9v/zYu9hnt+3Zc+dt+OGH3snR++1WdFrFOPIL5bGjL6t1FmZWBVXrU7Fyw3bvx/QLDy86jeJdNbA89p9D4LKXap+LmeXKVypmZpYbFxUzM8uNi4qZmeXGRcXMzHLjomJmZrlxUbHaO3lSeezE/6p9HmaWOxcVq73lfy+PvVIhZmZ1x0XFam/oidB7u47lLbaB/Xv0izvNrJv85Uervb6D4JwZ8NAkiNUw8qOw455FZ2VmOXBRsWLsuh+c8K2iszCznLmoWDFe+is8ehNEwMGnQ99N9C2YZhsZFxWrvX+2waT/A6+0ZcsPTYLz/gB9+hebl5ltMHfUW+3Nva2joAC8ugTm3FpcPmaWGxcVq70ttiqP9aoQM7O6U/dFRdJYSU9LapF0cdH5WDfs/6/QsG/H8k57wzveX1w+Zpabuu5TkbQ5cC3wHqAVmC1pWkQ8VWxmtlZbbgcT7oOn78w66vc5DnpvU3RWhRl+03DeWP0GAFttvhWzz5hdcEZm66/er1RGAi0RsTAiVgA3A+MKzsm6Y4ut4YB/gwPft0kXlBN+ccKbBQXgtVWvccadZxSYkdmGqfeiMgBYVLLcmmJmdeHZZc+Wxeb+fW4BmZjlo96LiirEoqyRNEFSs6Tmtra2CpuYFWPXbXcti+25g58uYPWr3otKKzCoZHkg8NyajSJiUkQ0RURTQ0NDzZIz68qv3vcrNiv5NeylXvx83M8LzKg4+zxVfoU2dP68AjKxDVHXHfXAbKBR0hDgr8CpwAeLTcls3Tw+/nFeW/Eam2kzem/Ru+h0CrPZZpsxdP48IgKp0k0Iqwd1XVQiYqWk84HpwObA5IjwDWmrO1v19vd02rmg1Le6LioAEXEncGfReZiZWf33qZiZWQ/iomJmZrlxUTEzs9y4qJiZWW5cVMzMLDcuKmZmlhsXFTMzy40iyh6VtVGT1Ab8peA0dgb+XnAOPYXPRQefiw4+Fx16wrnYIyK69YyrTa6o9ASSmiOiqeg8egKfiw4+Fx18LjrU27nw7S8zM8uNi4qZmeXGRaUYk4pOoAfxuejgc9HB56JDXZ0L96mYmVlufKViZma5cVHJgaTJkhZLmlMSO0XSXEmrJZWN3JC0u6R/SvqPte2n3qzLuZB0uqTHSqbVkoatsb9p9Xo+1vFcbCFpiqQnJc2TdEmKD5J0b4rNlXRBEceyoTo5F1+TNF/SE5Juk9S3ZN0lklokPS3pmJJ4X0m3pu3mSTq01seyodblXEgaWfL78bikk0u2uUDSnPT34sIijqWiiPC0gRNwODAcmFMSGwrsA9wHNFXY5ufAz4D/WNt+6m1an3OR2hwILFwj9q/Aj+v1fKzLuSB7Y+nNaX4b4BlgMNAfGJ7i2wN/BPYr+thyOhdjgF5p/qvAV9P8fsDjwJbAEOBPwOZp3RTgI2m+N9C36GOr8rnYpiTeH1hM9h6sA4A57euBXwONRR9bRPhKJQ8R8VtgyRqxeRHxdKX2kk4CFgJz19imbD/1Zl3PRYnTgJ+0L0jaDvgk8KXck6yRdTwXAWwrqRewNbACeDkino+IR9K2y4B5wIDqZp6/Ts7FryJiZVqcBQxM8+PICuzrEfFnoAUYKakP2T/IN6TtV0TEizU5gByty7mIiOUl8a3I/p5A9p+TWSXrfwOcTA/golJjkrYFPgNcXnQuPcwHKCkqwJXAN4DlxaRTc7cCrwDPA88CX4+It/zDI2kwcDDwYK2Tq4EPA3el+QHAopJ1rSm2J9AG/FDSo5J+kH6fNjal5wJJh0iaCzwJnJuKyBzgcEk7SdoGOA4YVEi2a3BRqb3Lgasj4p9FJ9JTSDoEWB4Rc9LyMGDviLit2MxqaiSwCngb2S2fT0nas31lunL7OXBhRLxcTIrVIelzwErgR+2hCs2C7DbPcOD6iDiYrAhfXJMka6TCuSAiHoyI/YF3ApdI2ioi5pHdJpsB3E12u3BlhV3WnItK7R0C/KekZ4ALgc9KOr/YlAp3Km+9SjkUGJHO0e+Bt0u6r4C8aumDwN0R8UZELAbuB5og68QnKyg/iohfFJhj7iSNB44HTo/UcUB2ZVL6v+6BwHMp3hoR7Vdqt5IVmY1CJ+fiTamQvELWn0JE3BARwyPicLLbaQtqmW9nXFRqLCL+JSIGR8Rg4FvA/4uI7xScVmEkbQacAtzcHouI6yPibekcvRv4Y0QcUUyGNfMscJQy2wKjgPmSRNaHMC8ivllohjmTNJbsVvCJEVF6m3MacKqkLSUNARqBhyLib8AiSfukdqOBp2qadJV0di4kDUn9bEjag2yQxzNpeZf0uTvZoJaf0BMUPVJgY5jI/jCfB94g+9/UOWSdZq3A68ALwPQK213GW0d/le2n6GOr9rkAjiDrcOxsf4Op39Ff3T4XwHZkowHnkv1DeVGKv5vs1s8TwGNpOq7oY8vpXLSQ9Z20H9d3S9p/jmzU19PAsSXxYUBzOh+/BPoVfWzVPBfAmenvxGPAI8BJJfv5Xfq78jgwuujjap/8jXozM8uNb3+ZmVluXFTMzCw3LipmZpYbFxUzM8uNi4qZmeXGRcXMzHLjomLWBUmr0qPH50j6WXrW0vru6whJd3TR5lhJzenR7vMlfX0df4YfAWSFcVEx69qrETEsIg4ge3rwuaUr07fgc/ldknQA8B3gjIgYSvZIjoV57NusFlxUzNbN74C9JQ1OVxLXkX3TeZCkMZIekPRIuqLZDrJHcKQrjt+TPU5jbT4NfDki5gNExMqIuC7tZw9JM9OLnGamx3O0P8rjAUmzJV1ZujNJF6X4E5L8ZGyrOhcVs25Kz2A6luwR5JA9h2lqdDwx9/PA0RExnOxRIp+UtBXwfeAE4F+A3br4MQcAD3ey7jvp572D7Cm216T4t8me3PtO4G8l+Y4he27WSLLHm4yQdHj3j9hs3bmomHVta0mPkRWKZ0kviQL+EhGz0vwosjcW3p/ajgf2APYF/hwRCyJ7JtJ/b0Aeh5K9CRPgJrLnggEcRsfDBG8qaT8mTY+SXU3tS1ZkzKqmV9EJmNWBVyNiWGkge3gwr5SGgBkRcdoa7YbR8ba+7pgLjCB7SGBXopP50py+EhHfW4efb7ZBfKVilo9ZwGGS9gaQtI2ktwPzgSGS9krtTutsB8nXyN6x8/a0n80kfTKt+wPZu2cATid71wxk714pjbebDny4pG9nQPvj0s2qxUXFLAcR0QZ8CPiJpCfIisy+EfEaMAH4n9RR/5cu9vME2cvbfiJpHtlrY/un1Z8Azk77PxO4IMUvACZKmg3sULKvX5HdLntA0pNkL7XaPofDNeuUH31vZma58ZWKmZnlxh31ZgWQdDYdt6/a3R8RE4vIxywvvv1lZma58e0vMzPLjYuKmZnlxkXFzMz3ehJfAAAAE0lEQVRy46JiZma5cVExM7Pc/C/Ylg+cCIKdZwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7fcd6bdbecc0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.stripplot(x=\"Pred_Code\", y=\"rng\", data=test_data);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 数量太少了，可能算错了，大部分还是分布在后面"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(8, 16)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_data.loc[((test_data.Pred_Code == 1141) & (test_data.rng <= 20000))].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 根据群友给的每个单独的结果计算的值计算出所有的label的个数\n",
    "\n",
    "按照下面的方法进行排序！\n",
    "\n",
    "- 1182： 6个(最前面6个,杰神朋友给的)\n",
    "- 1206： 67048个\n",
    "- 1239： 3952个\n",
    "- 1141： 431个 最后122个往前瞬移\n",
    "- 1174： 最后122个"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import f1_score  \n",
    "test_data['SPN_FMI_Code'] = 1206\n",
    "test_data.iloc[:6]['SPN_FMI_Code'] = 1182\n",
    "test_data.iloc[6:6+67048]['SPN_FMI_Code'] = 1206\n",
    "test_data.iloc[6 + 67048 : 6 + 67048 + 3952]['SPN_FMI_Code'] = 1239\n",
    "test_data.iloc[6 + 67048 + 3952:6 + 67048 + 3952 + 431]['SPN_FMI_Code'] = 1141\n",
    "test_data.iloc[6 + 67048 + 3952+ 431:]['SPN_FMI_Code'] = 1174  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data['SPN_FMI_Code'] = 1206\n",
    "test_data.ix[:6]['SPN_FMI_Code'] = 1182\n",
    "test_data.ix[6:6+67048,'SPN_FMI_Code'] = 1206\n",
    "test_data.ix[6 + 67048 : 6 + 67048 + 3952,'SPN_FMI_Code'] = 1239\n",
    "test_data.ix[6 + 67048 + 3952:6 + 67048 + 3952 + 431,'SPN_FMI_Code'] = 1141\n",
    "test_data.ix[6 + 67048 + 3952+ 431:,'SPN_FMI_Code'] = 1174  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1206    67048\n",
       "1239     3952\n",
       "1141      431\n",
       "1174      122\n",
       "1182        6\n",
       "Name: SPN_FMI_Code, dtype: int64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_data['SPN_FMI_Code'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 模型提交"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data['Pred'] = test_data['SPN_FMI_Code'].values\n",
    "test_data[['idx','Pred']].to_csv('1.csv',index =None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1206    67048\n",
       "1239     3952\n",
       "1141      431\n",
       "1174      122\n",
       "1182        6\n",
       "Name: Pred, dtype: int64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_data['Pred'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 小结\n",
    "\n",
    "通过观察测试集的预测分布，有的时候我们可以抓住出题方因为一些细节原因造成的leak,不过因为此次比赛后来又重新进行了复赛，复赛中leak被修复了，但是这种对于测试预测的观察还是一种很有趣的发现。\n",
    "\n",
    "数据竞赛很多时候还是看线上得分的,对训练集&验证集的探索是必备的，但是很多时候对测试集的预测进行某种程度的分析,往往也会有意想不到的收获。\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "349px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
